SEARCH KEYWORD -- Source map



  Using MemoryMappedBuffer to handle large file in Java

Handling large files with the traditional FileInputStream, FileOutputStream or RandomAccessFile can slow processing considerably, since they trigger many read and write operations. Java NIO introduced another way to handle large files: using MemoryMappedBuffer to create a memory-mapped file. Memory-mapped I/O uses the filesystem to establish a virtual memory mapping from user space directly to the applicable filesystem pages. With a memory-mapped file, you can pre...

   JAVA,IO,NIO     2015-11-13 01:58:08
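
As a rough illustration of the memory-mapped approach this teaser describes, here is a minimal sketch. It assumes the JDK class `java.nio.MappedByteBuffer` obtained through `FileChannel.map` is what the article calls MemoryMappedBuffer; the class name `MappedFileSketch`, the helper `roundTrip` and the temp-file name are made up for the example and do not come from the article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileSketch {
    // Hypothetical helper: writes `text` to a temp file through a memory
    // mapping, then reads it back through a second, read-only mapping.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("mapped-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE past the current end of the file grows the file.
                MappedByteBuffer out = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
                out.put(bytes);   // a plain memory write, no explicit write() call
                out.force();      // ask the OS to flush the mapped pages to storage

                MappedByteBuffer in = channel.map(FileChannel.MapMode.READ_ONLY, 0, bytes.length);
                byte[] back = new byte[bytes.length];
                in.get(back);     // a plain memory read, no explicit read() call
                return new String(back, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello mapped file"));
    }
}
```

For a genuinely large file one would typically map it in windows rather than all at once, since a single mapping is limited to `Integer.MAX_VALUE` bytes.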

  Why are column oriented databases so much faster than row oriented databases?

I have been playing around with Hybrid Word Aligned Bitmaps for a few weeks now, and they turn out to be a rather remarkable data structure.  I believe that they are utilized extensively in modern column oriented databases such as Vertica and MonetDB. Essentially HWABs are a data structure that allows you to represent a sparse bitmap (series of 0's and 1's) really efficiently in memory.  The key trick here is the use of run length encoding to compress the bitmap into fe...

   Database,Column oriented,Speed analysis,Vertica     2012-01-29 04:27:05
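
The run-length idea mentioned in this teaser can be sketched as follows. This is plain run-length encoding of a bitmap, not the hybrid word-aligned scheme itself, which packs runs and literal words into machine words; the class and method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RunLengthBitmap {
    // Encodes the bitmap as alternating run lengths. By convention the first
    // run counts 0s, so a bitmap that starts with a 1 begins with a run of length 0.
    static List<Integer> encode(boolean[] bits) {
        List<Integer> runs = new ArrayList<>();
        boolean current = false;
        int length = 0;
        for (boolean bit : bits) {
            if (bit == current) {
                length++;
            } else {
                runs.add(length);
                current = bit;
                length = 1;
            }
        }
        runs.add(length);
        return runs;
    }

    // Expands alternating run lengths back into the original bitmap.
    static boolean[] decode(List<Integer> runs) {
        int total = 0;
        for (int run : runs) {
            total += run;
        }
        boolean[] bits = new boolean[total];
        boolean current = false;
        int pos = 0;
        for (int run : runs) {
            for (int i = 0; i < run; i++) {
                bits[pos++] = current;
            }
            current = !current;
        }
        return bits;
    }
}
```

A sparse bitmap with long stretches of 0s collapses to a handful of integers, which is the source of the memory savings the teaser refers to.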

  A Month With Scala

Although I’ve played around with Scala for a few months, these efforts largely involved simple scripts and casual reading. It wasn’t until last month that the opportunity to use Scala in a large-scale project finally arose and I dove right in. The project was a typical REST-based web service built on top of Amazon’s Elastic Beanstalk, SimpleDB, S3 and Redis*. First off let’s talk about why I chose Scala in the first place. After spending a good deal of my las...

   Scala,Functional,OOP,Java,Iteration     2011-12-10 06:03:23

  A list of different CAPTCHA designs

Here is a list of website CAPTCHA designs which demonstrate all kinds of weird verification methods. By looking at these designs, as a website designer, you should distinguish which design is accessible and which is not. Is this human readable? What characters are in the picture? One more. We will go crazy if we see these. OMG, what's the answer? IQ test? Are you a normal person? You know how blind people read? ASCII picture. Are you an adult? 3D verification code. reCaptcha: it added a new feature recently. It...

   CAPTCHA,Website design     2012-07-19 11:51:06

  Java Sequential IO Performance

Many applications record a series of events to file-based storage for later use.  This can be anything from logging and auditing, through to keeping a transaction redo log in an event sourced design or its close relative CQRS.  Java has a number of means by which a file can be sequentially written to, or read back again.  This article explores some of these mechanisms to understand their performance characteristics.  For the scope of this article I will be using pre-a...

   Java,IO,Sequential,Blocking     2012-02-23 07:09:10

  Tom Uglow from Google : 5 steps to innovation

Google is known for its innovation. At Google there is a "welfare" perk: each employee can spend 20% of his/her time on whatever he/she likes, so that any idea has a chance to be turned into reality. Perhaps this freedom is what lets Google introduce new products and new ideas continuously. Google China recently held a small discussion session in its office in Tsinghua Science Park, where Google Creative Director Tom Uglow shared some experience and cases of Google in production and innovat...

   Innovation,Technology,Google     2012-10-10 20:00:47

  CASSANDRA data model

Cassandra is an open source distributed database that combines a dynamic key/value model with the column-oriented features of Bigtable. Features of Cassandra are: a flexible schema, with no need to design the schema first, so it's very convenient to add or delete strings; support for range search on keys; and high availability and extensibility, since a single node failure will not affect the cluster. We can think of Cassandra's data model as a 4- or 5-dimensional hash. COLUMN: A column is the smallest data unit in Cassandra; it is a 3 dimensional data...

   Cassandra,database,sort     2013-06-08 22:07:40

  Understanding lvalues and rvalues in C and C++

The terms lvalue and rvalue are not something one runs into often in C/C++ programming, but when one does, it’s usually not immediately clear what they mean. The most common place to run into these terms is in compiler error and warning messages. For example, compiling the following with gcc: int foo() {return 2;} int main() { foo() = 2; return 0; } You get: test.c: In function 'main': test.c:8:5: error: lvalue required as left operand of assignment True, this code ...

   lvalue,rvalue,C++,locator value,elaboration     2011-12-15 07:51:38

  Cleansing data with Pig and storing JSON format to HBase with Pig UDF

Introduction: This post explains how to clean data and store it in HBase in JSON format. Hadoop architect experts also explain Apache Pig and its advantages in Hadoop. Read more and find out how they do it. This post contains steps to do some basic cleaning of duplicate data and convert the data to JSON format for storage in HBase. Pig does have some built-in libraries for parsing JSON, but it is important to manipulate the JSON data in Java code before storing it in HBase. Apache Pig...

   JSON,HADOOP ARCHITECT,APACHE HBASE,PIG UDF     2016-06-10 01:13:41

  When will resizing be triggered in Java HashMap?

HashMap is one of the most frequently used collection types in Java; it stores key-value pairs. Ideally it behaves as a hash table with O(1) data access time. In reality, because of hash conflicts, it stores colliding entries in a linked list or red-black tree, which makes the worst-case time complexity O(log n). Although collections use data structures like arrays and linked lists, unlike arrays, they resize dynamically when there is not enough spa...

   JAVA,RESIZE,HASHMAP,THRESHOLD     2020-05-02 20:41:19
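
The teaser is cut off before it states the resize rule, so the following is a simplified model based on the documented behaviour of `java.util.HashMap` rather than on the article: the table doubles once the entry count exceeds capacity × load factor (defaults 16 and 0.75). The class and method names are invented, and the model ignores details such as lazy table allocation and the extra resizes that can happen when a crowded bucket appears in a small table.

```java
public class ResizeThresholdSketch {
    // Simplified model, not the JDK source: returns the table capacity after
    // inserting `entries` distinct keys, doubling whenever the entry count
    // exceeds capacity * loadFactor.
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        for (int size = 1; size <= entries; size++) {
            if (size > capacity * loadFactor) {
                capacity *= 2; // resize: the bucket array doubles
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        // With the defaults, the threshold is 16 * 0.75 = 12 entries.
        System.out.println(capacityAfter(12, 16, 0.75f)); // still 16
        System.out.println(capacityAfter(13, 16, 0.75f)); // 13th entry doubles the table to 32
    }
}
```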